 single-step retrosynthesis


Accelerating the inference of string generation-based chemical reaction models for industrial applications

Andronov, Mikhail, Andronova, Natalia, Wand, Michael, Schmidhuber, Jürgen, Clevert, Djork-Arné

arXiv.org Artificial Intelligence

Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the molecular transformer implemented in PyTorch Lightning and achieve over 3X faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.
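
The copy-based drafting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the character-level tokens, the `EOS` marker, the window size, the `replay_model` stand-in for a trained transformer, and all function names are assumptions made for illustration. A real model would score every draft position in a single forward pass; the toy loop below calls the model once per position and only counts how many such passes would be needed.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token
Token = str
NextToken = Callable[[Sequence[Token]], Token]  # hypothetical greedy-model interface


def draft_windows(query: Sequence[Token], size: int) -> List[List[Token]]:
    """All contiguous query subsequences of length `size` (candidate drafts)."""
    return [list(query[i:i + size]) for i in range(len(query) - size + 1)]


def accepted_prefix(model: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Tokens the greedy model emits while checking one draft.

    Draft tokens are kept while they match the model's greedy choice; the
    first mismatch is replaced by the model's own token, so every check
    yields at least one token and the result equals plain greedy decoding.
    """
    out: List[Token] = []
    for tok in draft:
        best = model(prefix + out)
        out.append(best)
        if best != tok or best == EOS:
            return out
    return out or [model(prefix)]  # empty draft: fall back to one greedy step


def speculative_greedy(model: NextToken, query: Sequence[Token],
                       size: int = 4, max_len: int = 256) -> Tuple[List[Token], int]:
    """Greedy decoding that tries query windows as drafts; returns (tokens, passes)."""
    drafts = draft_windows(query, size) or [[]]
    output: List[Token] = []
    passes = 0  # each loop iteration stands for one batched forward pass
    while len(output) < max_len and (not output or output[-1] != EOS):
        passes += 1
        best = max((accepted_prefix(model, output, d) for d in drafts), key=len)
        output.extend(best)
    return output, passes


def replay_model(target: Sequence[Token]) -> NextToken:
    """Toy stand-in for a trained model: always continues a fixed target string."""
    def model(prefix: Sequence[Token]) -> Token:
        return target[len(prefix)] if len(prefix) < len(target) else EOS
    return model


# Toy retrosynthesis query: ethyl acetate -> acetic acid + ethanol.
query = list("CC(=O)OCC")
target = list("CC(=O)O.CCO") + [EOS]
tokens, passes = speculative_greedy(replay_model(target), query)
```

Because product and reactant SMILES share long substrings, several tokens are typically accepted per pass in this toy setting, while the output stays identical to token-by-token greedy decoding.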


Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction

Sacha, Mikołaj, Sadowski, Michał, Kozakowski, Piotr, van Workum, Ruard, Jastrzębski, Stanisław

arXiv.org Machine Learning

Retrosynthesis involves the strategic breakdown of complex molecules into simpler precursors, paving the way for the synthesis of novel molecules. Recently, AI-based methods for retrosynthesis have been developed that learn reaction rules from data on historically performed reactions. A central component of such systems is a model for single-step retrosynthesis that predicts which reactions could lead to a given target molecule. Two dominant methodologies are used for single-step retrosynthesis. Template-based methods use a set of translation rules that represent the possible chemical transformations. Although these methods are characterized by speed and interpretability, they may require an extensive set of templates to cover a large space of chemical reactions, which limits their generalization capacity. Conversely, template-free approaches can produce arbitrary reactions without such constraints but are often computationally demanding, largely due to their dependency on autoregressive decoding [1, 2, 3, 4].


Evolutionary Retrosynthetic Route Planning

Zhang, Yan, Hao, Hao, He, Xiao, Gao, Shuanhu, Zhou, Aimin

arXiv.org Artificial Intelligence

Molecular retrosynthesis is a significant and complex problem in the field of chemistry. However, traditional manual synthesis methods not only require well-trained experts but are also time-consuming. With the development of big data and machine learning, artificial intelligence (AI)-based retrosynthesis is attracting more attention and is becoming a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, we propose a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of an Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem as an optimization problem and defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products and is compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA significantly reduces the number of calls to the single-step model by an average of 53.9%. The time required to find three solutions decreases by an average of 83.9%, and the number of feasible search routes increases by a factor of 5.


Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning

Torren-Peraire, Paula, Hassen, Alan Kai, Genheden, Samuel, Verhoeven, Jonas, Clevert, Djork-Arné, Preuss, Mike, Tetko, Igor

arXiv.org Artificial Intelligence

Retrosynthesis consists of recursively breaking down a chemical compound, step by step, into molecular precursors until a set of commercially available molecules is found, with the goal of providing a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synthesis planning, which tries to find the correct sequence of reactions, are inherently intertwined. Still, this connection is not reflected in contemporary research. In this work, we combine these two major research directions by applying multiple single-step retrosynthesis models within multi-step synthesis planning and analyzing their impact using public and proprietary reaction data. We find a disconnect between high single-step performance and potential route-finding success, suggesting that single-step models must be evaluated within synthesis planning in the future. Furthermore, we show that the commonly used single-step retrosynthesis benchmark dataset USPTO-50k is insufficient, as this evaluation task does not represent model performance and scalability on larger and more diverse datasets. For multi-step synthesis planning, we show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to 28% compared to the commonly used baseline model. Finally, we show that each single-step model finds unique synthesis routes and differs in aspects such as route-finding success, the number of found synthesis routes, and chemical validity, making the combination of single-step retrosynthesis prediction and multi-step synthesis planning a crucial aspect when developing future methods.


Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables

He, Huarui, Wang, Jie, Liu, Yunfei, Wu, Feng

arXiv.org Artificial Intelligence

Single-step retrosynthesis is the cornerstone of retrosynthesis planning, which is a crucial task for computer-aided drug discovery. The goal of single-step retrosynthesis is to identify the possible reactants that lead to the synthesis of the target product in one reaction. By representing organic molecules as canonical strings, existing sequence-based retrosynthetic methods treat product-to-reactant retrosynthesis as a sequence-to-sequence translation problem. However, most of them struggle to identify diverse chemical reactions for a desired product because their inference is deterministic, which contradicts the fact that many compounds can be synthesized through various reaction types with different sets of reactants. In this work, we aim to increase reaction diversity and generate various reactants using discrete latent variables. We propose a novel sequence-based approach, namely RetroDVCAE, which incorporates conditional variational autoencoders into single-step retrosynthesis and associates discrete latent variables with the generation process. Specifically, RetroDVCAE uses the Gumbel-Softmax distribution to approximate the categorical distribution over potential reactions and generates multiple sets of reactants with the variational decoder. Experiments demonstrate that RetroDVCAE outperforms state-of-the-art baselines on both a benchmark dataset and a homemade dataset. Both quantitative and qualitative results show that RetroDVCAE can model the multi-modal distribution over reaction types and produce diverse reactant candidates.
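
The Gumbel-Softmax relaxation mentioned in the abstract above can be sketched generically as follows. This is not the paper's code: the three-way categorical distribution, the temperature value, and the function name `gumbel_softmax` are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random
from typing import List, Sequence


def gumbel_softmax(logits: Sequence[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Draw one relaxed (soft) one-hot sample from a categorical distribution.

    Gumbel(0, 1) noise is added to each logit, and a temperature-scaled
    softmax replaces the hard argmax. Lower temperatures give samples that
    are closer to one-hot vectors.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in noisy]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical categorical distribution over three latent reaction classes.
rng = random.Random(0)
logits = [math.log(p) for p in (0.7, 0.2, 0.1)]
sample = gumbel_softmax(logits, temperature=0.5, rng=rng)
```

Each call returns a probability vector over the latent classes. The index of its largest entry follows the original categorical distribution, which is why the relaxation can stand in for discrete sampling during training.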